Handle record/item/row deletions #265

Draft
jthompson-arcus wants to merge 13 commits into dev from jt-handle_record_deletions
Conversation

@jthompson-arcus
Collaborator

Few Notes

Okay, the workflow here is: run db_upgrade(db_path), and then db_update() should work as expected.

The function db_update() won't run if the database versions of the SQLite file and the R package don't align.

New databases will be initialized at the correct version.
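For illustration, the version guard described above can be sketched standalone (names simplified; db_get_version() is stubbed here so the example runs on its own, and check_db_version() is a hypothetical helper, not the package implementation):

```r
# Simplified stand-in for the guard added to db_update(): refuse to run
# when the file's recorded version differs from what the package expects.
db_version <- "1.2"                        # version the R package expects
db_get_version <- function(db_path) "1.1"  # stub; the real one reads the SQLite file

check_db_version <- function(db_path) {
  current <- db_get_version(db_path)
  if (!identical(current, db_version)) {
    stop("Database version (", current, ") does not match package version (",
         db_version, "). Run db_upgrade() first.")
  }
  invisible(TRUE)
}
```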

Summary of Changes

If a record/item/row is no longer in the merged data set, it will be deleted from all_review_data. A trigger has been added to log these deleted records to the all_review_data_log table with a DML type of 'DELETE'.

Comment on lines 247 to 262
db_upgrade <- function(db_path){
  stopifnot(file.exists(db_path))

  current_db_version <- db_get_version(db_path)
  if (identical(current_db_version, db_version))
    return(paste("Upgraded to version", db_version))

  switch(
    current_db_version,
    "1.1" = db_temp_connect(db_path, {
      create_delete_log_trigger(con)
      DBI::dbWriteTable(con, "db_version", data.frame(version = "1.2"), overwrite = TRUE)
    }),
    stop("No upgrade available for version ", current_db_version)
  )
  db_upgrade(db_path)
Collaborator Author

I like this approach to the database upgrading issue. By using recursion, we can essentially just add a step to the upgrade process. Once the database is fully upgraded to the version expected by ClinSight itself, the function returns a success message.

I also added a version check in db_update(). I don't know if there are other places versioning should be checked.

Collaborator

I am fine with this approach. I guess this needs to be rewritten as a loop if there are multiple database versions, but that can be done once needed.

You could also think of asking user confirmation of the update if one is found, something like this: utils::menu(c("Yes", "No"), title = "Update available. Do you really want to proceed?")

Version is also checked in run_app()

Collaborator Author

I guess this needs to be rewritten to a loop if there are multiple database versions but that can be done once needed.

Actually this approach doesn't need a loop, because it uses recursion (it calls itself at the end: db_upgrade(db_path)). Say the release after this is version 1.3. We would simply add a "1.2" step to the switch() and the function would iterate through the versions. This allows us to focus on the changes needed to go from version 1.2 to 1.3 and ignore 1.1 to 1.3.
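The recursion pattern can be demonstrated standalone (simplified: the version lives in an environment instead of an SQLite file, and the "1.2" migration step is hypothetical, standing in for a future release):

```r
# Each switch() branch performs exactly one version step; the function then
# calls itself until the target version is reached.
target_version <- "1.3"
db <- new.env()
db$version <- "1.1"

db_upgrade_demo <- function() {
  if (identical(db$version, target_version))
    return(paste("Upgraded to version", target_version))
  switch(
    db$version,
    "1.1" = { db$version <- "1.2" },  # the 1.1 -> 1.2 step from this PR
    "1.2" = { db$version <- "1.3" },  # hypothetical future 1.2 -> 1.3 step
    stop("No upgrade available for version ", db$version)
  )
  db_upgrade_demo()  # recurse for the next step
}
```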

Collaborator Author

You could also think of asking user confirmation of the update if one is found, something like this: utils::menu(c("Yes", "No"), title = "Update available. Do you really want to proceed?")

Sure. I think that makes sense.
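A standalone sketch of what that confirmation might look like (confirm_upgrade is a hypothetical helper name; the utils::menu() call is the one suggested above, and the ask argument lets non-interactive sessions skip the prompt):

```r
# Ask for confirmation before upgrading; menu() only works interactively,
# so default the prompt to interactive() and skip it otherwise.
confirm_upgrade <- function(ask = interactive()) {
  if (ask) {
    choice <- utils::menu(
      c("Yes", "No"),
      title = "Update available. Do you really want to proceed?"
    )
    if (choice != 1) return(invisible(FALSE))
  }
  invisible(TRUE)
}
```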

Collaborator Author

Version is also checked in run_app()

Yes, I added checks to db_update() because it is exported and can be run outside of run_app().

Collaborator Author

Was updated by running db_upgrade("tests/testthat/fixtures/testdb.sqlite")

@jthompson-arcus
Collaborator Author

@LDSamson I need to add some more to this PR (tests and internal docs), but I wanted to get your initial impression of this approach.

Collaborator

@LDSamson left a comment

Few initial comments, see below.

#' @return A data frame containing only the rows to remove from the review data.
#'
#' @keywords internal
delete_review_data <- function(
Collaborator

Function name is a bit confusing because it is not deleting review data but rather returning the rows to be deleted.

Collaborator Author

I agree. I was mimicking update_review_data() below, which also doesn't do what its name says and simply returns the data to update.

Comment on lines +324 to +331
cat("Check for deleted rows\n")
deleted_review_data <- delete_review_data(
  review_df = review_data,
  latest_review_data = data,
  key_cols = key_cols
)
cat("logging deleted review data to database...\n")
db_delete(con, deleted_review_data)
Collaborator

Do we need to hard-force a DB update to this DB version, or can we run this part on the condition that the DB is at least version 1.2?

Collaborator Author

I think things will get messy fast if we allow desynching between the DB version and the ClinSight version. Also, we already have a hard version check in run_app().

Collaborator

Desynching should be removed as soon as it becomes messy, of course. A desynch of no more than a minor version would not be a problem, I think? We can remove the desynching at the next DB update.

It would make it easier to update ClinSight in running studies. With that desynch, the dev version is still a drop-in replacement for the current v0.3.0 version, with no breaking changes.

Comment on lines +325 to +331
deleted_review_data <- delete_review_data(
  review_df = review_data,
  latest_review_data = data,
  key_cols = key_cols
)
cat("logging deleted review data to database...\n")
db_delete(con, deleted_review_data)
Collaborator

Is this safe to run automatically with each update?

What if this goes wrong and the wrong lines are selected for deletion? Can this be restored somehow easily? Or is that difficult?

For example: what if you use an API to pull data, the data pull malfunctions and returns either zero rows or not enough rows, and you run db_update()? Do you lose all your review data then?

Collaborator Author

The deleted rows are located in the logging table. The situation you outlined above seems outside of this process though. I would hope that a check would be in place to verify data before pushing to ClinSight. And even if a push happened, I would hope someone would be creating DB backups for rollback if needed.

Collaborator Author

@jthompson-arcus commented Jan 14, 2026

More specifically on the restoration side, the old record is stored in the logging table. Those records could be retrieved with their review status if needed.

Above statement is inaccurate. The key columns are not being stored in the logging table, only the ID. Their values are truly lost in this case.

Collaborator

The db_update() function is meant to run automatically and its behaviour should not be destructive, I think. At least any potential destructive behaviour should be optional.

@jthompson-arcus
Collaborator Author

@LDSamson on further reflection I think I understand the concern. If a record gets removed and then later reappears, the review data should see these as the same record and not two different ones. Is this a good summary?

@LDSamson
Collaborator

@LDSamson on further reflection I think I understand the concern. If a record gets removed and then later reappears, the review data should see these as the same record and not two different ones. Is this a good summary?

Yes that would be ideal indeed. It should not be a destructive action to the DB.
We could also change the workflow to not be an automatic deletion, but rather a manual function that you run interactively, with, for example, a message ('These rows will be deleted. Are you sure?'). Even then it would be nice if it is non-destructive, but at least you can instruct the user to make a backup first.
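One non-destructive variant of that idea, sketched standalone (the active flag and the sync_active_flag helper are hypothetical, not part of this PR):

```r
# Instead of deleting rows whose keys vanished from the merged data, mark
# them inactive; a record that later reappears simply becomes active again
# and keeps its old review status.
sync_active_flag <- function(review_df, latest, key_cols) {
  current <- do.call(paste, latest[key_cols])
  stored  <- do.call(paste, review_df[key_cols])
  review_df$active <- stored %in% current
  review_df
}

review_df <- data.frame(subject = c("S1", "S1"),
                        item    = c("AE1", "AE2"),
                        status  = c("reviewed", "reviewed"))
latest <- data.frame(subject = "S1", item = "AE1")
sync_active_flag(review_df, latest, c("subject", "item"))
# AE1 stays active; AE2 is flagged inactive but its review status survives
```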

@aclark02-arcus
Collaborator

aclark02-arcus commented Feb 4, 2026

Note, opened #270 so we have an issue to track for this work.
